Abstract

Empirical research with reliability data from electricity transmission networks shows that the size of major failures, in terms of energy not supplied (ENS), total loss of power (TLP) or restoration time (RT), appears to follow a power law in the upper tail of the distribution. However, this pattern, also known as the Pareto distribution, is not valid over the whole range of those major events. We aimed to find a probability distribution that could be used to model them, and hypothesized that there is a two-parameter model that fits the pattern of those data well over the entire domain. We considered the major failures produced between 2002 and 2009 in the European power grid; analyzed three reliability indicators: ENS, TLP and RT; fitted six alternative models (Pareto II, Fisk, Lognormal, Pareto, Weibull and Gamma distributions) to the data by maximum likelihood; compared these models by the Bayesian information criterion; tested their goodness-of-fit by a Kolmogorov-Smirnov test method based on bootstrap resampling; and validated them graphically by rank-size plots. We found that the Pareto II distribution is an adequate model to describe major events reliability data of power grids over the whole range in the case of ENS and TLP, and the best choice of the six alternative models analyzed in the case of RT.
1 Introduction



Electricity transmission networks provide the means to transport electricity from the power plants, where it is produced, to the distribution networks near our homes and businesses. Unfortunately, failures in these systems do happen, and nowadays electricity is essential for all of us. For that reason, the analysis of these failure events, in particular from a statistical point of view, is crucial to improve the reliability of transmission infrastructures.

In this direction, some promising results have been obtained using network reliability data from major events: the number of customers affected by electrical blackouts in the United States between 1984 and 2002 [2], as well as the energy not supplied, the total loss of power and the restoration time in the European power grid between 2002 and 2008 [3], can all be fitted by a power law distribution (also known as the Pareto distribution [4] [5]) in the upper tail of the distribution.

However, this power law behaviour does not hold over the whole range of the datasets analyzed, because the power-law upper tail contains only a small number of observations. For example, in the datasets analyzed in [3], only 15% of the major events for energy not supplied and less than 10% of the major events for total loss of power follow that power law behaviour.

The aim of this study was to find a probability distribution that can be used to model major events reliability data of electricity transmission networks over the whole range. Our primary hypothesis was that there is a two-parameter model that fits the pattern of the data well, following the principle of parsimony and admitting more than two parameters only if necessary. The rest of this paper is organized as follows: in Section 2 we introduce the datasets analyzed and the methods used; the results are presented and discussed in Section 3; finally, the conclusions are given in Section 4.

2 Data and Methods 

We considered the network reliability data from the Union for the Co-ordination of Transmission of Electricity (UCTE) [6]. Taking 2007 as a reference year, UCTE was an association of 29 transmission system operators from 24 European countries, with an installed capacity of 640 GW, an electricity consumption of 2600 TWh, 220000 km of managed high-voltage transmission lines and 500 million people served. In 2009, all UCTE operational tasks were transferred to the European Network of Transmission System Operators for Electricity (ENTSO-E) [7]. The data considered correspond to a random sample of major events between 2002 and 2009, with Energy not Supplied (ENS) given in MWh, Total Loss of Power (TLP) given in MW and Restoration Time (RT) given in minutes; zero values have not been considered. This dataset was described before in [3], contains 698 major events, and can be found in [8]. Table 1 shows the main empirical characteristics of ENS, TLP and RT.



Table 1: Main empirical characteristics of ENS, TLP and RT, from major events of the UCTE
electricity transmission network, between 2002 and 2009.

Data Set        n     Mean     Std. Dev.   Skewness   Kurtosis   Min.   Max.
ENS (MWh)      583   631.17    7133.86     22.31      521.74     1      168000
TLP (MW)       528   374.41    1431.23     12.05      178.26     1      24120
RT (minutes)   689   493.36    3290.20     10.88      134.03     1      50432
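The summary statistics of Table 1 can be recomputed from a raw sample with a short Python sketch. Table 1 does not state its conventions, so we assume a standard deviation with the $n - 1$ denominator, moment-based skewness $m_3/m_2^{3/2}$ and non-excess kurtosis $m_4/m_2^2$ (an excess-kurtosis convention would subtract 3). The helper name `describe` is hypothetical.

```python
import math


def describe(sample):
    """Return n, mean, std. dev., skewness, kurtosis, min and max of a sample.

    Assumed conventions (not stated in Table 1): std. dev. with the n - 1
    denominator; skewness m3 / m2**1.5 and non-excess kurtosis m4 / m2**2,
    where mk are central moments with the 1/n denominator.
    """
    x = [v for v in sample if v != 0]  # zero values are excluded, as in Section 2
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return {
        "n": n,
        "mean": mean,
        "std": math.sqrt(m2 * n / (n - 1)),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
        "min": min(x),
        "max": max(x),
    }


# Toy input (not the UCTE data); the zero is dropped before computing.
print(describe([0, 1, 2, 3, 4, 5]))
```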



We fitted and compared six two-parameter models: the Pareto II distribution (also known as the Lomax distribution) [5] [9], the Fisk distribution (also known as the Log-logistic distribution) [10], the Lognormal distribution [11], the Pareto (power law) distribution, the Weibull distribution [12] and the Gamma distribution [13]. Table 2 shows the cumulative distribution functions F(x) and the probability density functions f(x) of these six distributions.
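For illustration, the six cdfs can be written in Python under standard parameterizations. These functional forms are our assumption, chosen to match the parameter names used in Tables 3 and 4 (Pareto II: $\alpha, \sigma$; Fisk: $\beta, \sigma$; Lognormal: $\mu, \sigma$; Pareto: $\alpha, \sigma$; Weibull: $\beta, \lambda$; Gamma: $\alpha, \sigma$). With this Gamma form the mean is $\alpha\sigma$, and the Table 4 estimates do reproduce the sample means of Table 1 (for ENS, $0.2249 \times 2806.5 \approx 631.2$ MWh), which supports this reading. The Gamma cdf is the regularized lower incomplete gamma function $\gamma(\alpha, x/\sigma)/\Gamma(\alpha)$, evaluated here by its power series; `scipy.special.gammainc` is the more robust choice in practice.

```python
import math

# Assumed standard parameterizations (our reconstruction, not a transcription
# of Table 2); parameter names follow Tables 3 and 4.


def pareto2_cdf(x, alpha, sigma):
    """Pareto II (Lomax): F(x) = 1 - (1 + x/sigma)^(-alpha), x >= 0."""
    return 1.0 - (1.0 + x / sigma) ** (-alpha)


def fisk_cdf(x, beta, sigma):
    """Fisk (log-logistic): F(x) = 1 / (1 + (x/sigma)^(-beta)), x > 0."""
    return 1.0 / (1.0 + (x / sigma) ** (-beta))


def lognormal_cdf(x, mu, sigma):
    """Lognormal: F(x) = Phi((log x - mu) / sigma), x > 0."""
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def pareto_cdf(x, alpha, sigma):
    """Pareto (power law): F(x) = 1 - (sigma/x)^alpha, x >= sigma."""
    return 0.0 if x < sigma else 1.0 - (sigma / x) ** alpha


def weibull_cdf(x, beta, lam):
    """Weibull: F(x) = 1 - exp(-(x/lambda)^beta), x >= 0."""
    return 1.0 - math.exp(-((x / lam) ** beta))


def gamma_cdf(x, alpha, sigma):
    """Gamma: F(x) = gamma(alpha, x/sigma) / Gamma(alpha), by the power series
    P(a, z) = z^a e^(-z) / Gamma(a + 1) * sum_k z^k / ((a + 1)...(a + k))."""
    z = x / sigma
    if z <= 0.0:
        return 0.0
    if z > 700.0:  # far beyond the bulk for shapes of order one: F is 1 in double precision
        return 1.0
    term = total = 1.0
    k = 0
    while term > 1e-16 * total:  # add terms until they no longer change the sum
        k += 1
        term *= z / (alpha + k)
        total += term
    log_prefix = alpha * math.log(z) - z - math.lgamma(alpha + 1.0)
    return min(1.0, math.exp(log_prefix) * total)


# Example: probability that ENS <= 100 MWh under the Table 3 Pareto II fit.
print(pareto2_cdf(100.0, 0.6445, 10.578))
```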

First, we fitted all six models by maximum likelihood [15]. For each model, the log-likelihood function is given by

$$\log \mathcal{L}(\theta \mid x) = \sum_{i=1}^{n} \log f(x_i \mid \theta), \tag{1}$$

where $\theta$ is the unknown parameter vector of the model, $x$ is the sample data and $f(x \mid \theta)$ is the probability density function of the model given in Table 2; the maximum likelihood estimate $\hat{\theta}$ of the parameter vector is the one that maximizes the log-likelihood function $\log \mathcal{L}(\theta \mid x)$.
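Maximizing Eq. (1) generally needs a numerical optimizer. A minimal sketch for the Pareto II model, assuming the Lomax density $f(x) = (\alpha/\sigma)(1 + x/\sigma)^{-\alpha-1}$: for fixed $\sigma$ the likelihood equation in $\alpha$ has the closed-form root $\hat{\alpha}(\sigma) = n / \sum_i \log(1 + x_i/\sigma)$, so the search reduces to one dimension, done here by golden-section search on $\log \sigma$. This profile-likelihood approach and the helper names are our choices, not necessarily the authors' implementation.

```python
import math
import random


def lomax_loglik(x, alpha, sigma):
    """Eq. (1) for the assumed density f(x) = (alpha/sigma) * (1 + x/sigma)**(-alpha - 1)."""
    n = len(x)
    s = sum(math.log1p(v / sigma) for v in x)
    return n * math.log(alpha) - n * math.log(sigma) - (alpha + 1.0) * s


def lomax_alpha_given_sigma(x, sigma):
    """Root of d(log L)/d(alpha) = n/alpha - sum(log(1 + x/sigma)) for fixed sigma."""
    return len(x) / sum(math.log1p(v / sigma) for v in x)


def fit_lomax(x, iters=100):
    """ML fit of Pareto II (Lomax): profile out alpha, then golden-section
    search over t = log(sigma) on a wide bracket around the log-median.

    Assumes the profile log-likelihood is unimodal on that bracket; for
    light-tailed data a finite maximum may not exist and the estimate then
    drifts to the upper edge of the bracket (the exponential limit).
    """
    def profile(t):
        sigma = math.exp(t)
        return lomax_loglik(x, lomax_alpha_given_sigma(x, sigma), sigma)

    log_median = math.log(sorted(x)[len(x) // 2])
    lo, hi = log_median - 10.0, log_median + 10.0
    g = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio, about 0.618
    a, b = hi - g * (hi - lo), lo + g * (hi - lo)
    fa, fb = profile(a), profile(b)
    for _ in range(iters):
        if fa < fb:  # the maximum lies in [a, hi]
            lo, a, fa = a, b, fb
            b = lo + g * (hi - lo)
            fb = profile(b)
        else:  # the maximum lies in [lo, b]
            hi, b, fb = b, a, fa
            a = hi - g * (hi - lo)
            fa = profile(a)
    sigma = math.exp((lo + hi) / 2.0)
    return lomax_alpha_given_sigma(x, sigma), sigma


# Usage on synthetic data (hypothetical parameters, not the UCTE sample),
# drawn by inverse transform: x = sigma * ((1 - u)**(-1/alpha) - 1).
rng = random.Random(1)
true_alpha, true_sigma = 0.8, 20.0
data = [true_sigma * ((1.0 - rng.random()) ** (-1.0 / true_alpha) - 1.0)
        for _ in range(5000)]
alpha_hat, sigma_hat = fit_lomax(data)
print(f"alpha = {alpha_hat:.4f}, sigma = {sigma_hat:.3f}")
```

The other five models can be fitted the same way by swapping in their own log-likelihoods; general-purpose optimizers work too, but the profile trick keeps this sketch dependency-free.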

Then, we compared those models using the following model selection criteria: the Akaike information criterion (AIC), defined by [T%]

$$\mathrm{AIC} = -2 \log L + 2d; \tag{2}$$
Table 2: Cumulative distribution functions and probability density functions used;
$\gamma(\alpha, x/\sigma)$ represents the lower incomplete gamma function.
and the Bayesian information criterion (BIC), defined by pj)

$$\mathrm{BIC} = \log L - \tfrac{1}{2} d \log n, \tag{3}$$

where $\log L = \log \mathcal{L}(\hat{\theta} \mid x)$ is the log-likelihood (see Eq. (1)) of the model evaluated at the maximum likelihood estimates, $d$ is the number of parameters and $n$ is the number of data points; the model chosen is the one with the smallest value of the AIC statistic or the largest value of the BIC statistic.
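Eqs. (2) and (3) translate directly into code. Reading Eq. (3) as $\mathrm{BIC} = \log L - \tfrac{1}{2} d \log n$ (larger is better), one gets $\mathrm{AIC} = -2\,\mathrm{BIC} - d \log n + 2d$; since all six candidate models have $d = 2$, AIC is a decreasing function of BIC for a given dataset, so both criteria must rank the models identically. The sketch below checks this on the ENS column of Table 5, assuming $n = 583$ (the ENS count in Table 1) is the $n$ that was used; the helper names are hypothetical.

```python
import math


def aic(loglik, d):
    """Eq. (2): smaller is better."""
    return -2.0 * loglik + 2.0 * d


def bic(loglik, d, n):
    """Eq. (3) in the 'larger is better' form: log L - (d/2) log n."""
    return loglik - 0.5 * d * math.log(n)


def loglik_from_bic(bic_value, d, n):
    """Invert Eq. (3) to recover log L from a reported BIC value."""
    return bic_value + 0.5 * d * math.log(n)


# ENS column of Table 5 (all six models have d = 2); n = 583 from Table 1.
ens_bic = {"Pareto II": -3125.2, "Fisk": -3137.4, "Lognormal": -3139.5,
           "Pareto": -3159.9, "Weibull": -3254.3, "Gamma": -3456.6}
n, d = 583, 2
ens_aic = {m: aic(loglik_from_bic(b, d, n), d) for m, b in ens_bic.items()}

by_bic = sorted(ens_bic, key=ens_bic.get, reverse=True)  # largest BIC first
by_aic = sorted(ens_aic, key=ens_aic.get)                # smallest AIC first
print(by_bic == by_aic, by_bic[0])  # prints: True Pareto II
```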

After that, we tested the goodness-of-fit of all six models by a Kolmogorov-Smirnov (KS) test method based on bootstrap resampling [21 dSl [TO HB]. Let $x_1 \le x_2 \le \cdots \le x_n$ be the sample of $X$ sorted in ascending order, and let

$$F_n(x_i) = \frac{i}{n+1}$$

be the empirical cumulative distribution function (cdf) at each sample value, given by the indicated plotting position formula [20]. Let $F(x; \hat{\theta})$ be the theoretical cdf of a particular model fitted by maximum likelihood. The KS statistic of the model is given by [BJJ |2T]

$$D_n = \sup_{i} \left| F_n(x_i) - F(x_i; \hat{\theta}) \right|, \quad i = 1, 2, \ldots, n, \tag{4}$$

and the null hypothesis to test is $H_0$: the data follow that model. Then, for each model, the procedure is as follows: (1) calculate the empirical KS statistic for the observed data; (2) generate by simulation enough synthetic data sets (in this study, 10000 data sets) with the same sample size $n$ as the observed data, using the fact that if $U$ is uniformly distributed on $[0, 1]$ and $Q(p; \hat{\theta})$ is the theoretical quantile function of the model, then $Q(U; \hat{\theta})$ follows that model distribution; (3) fit each synthetic data set by maximum likelihood and obtain its theoretical cdf; (4) calculate the KS statistic for each synthetic data set with its own theoretical cdf; (5) calculate the p-value as the fraction of synthetic data sets with a KS statistic greater than the empirical KS statistic; (6) reject the null hypothesis at the 0.05 level of significance if p < 0.05.
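Steps (1) to (6) can be sketched in Python for a single model. We use the two-parameter Pareto here only because its maximum likelihood fit is closed-form: $\hat{\sigma}$ is the sample minimum and $\hat{\alpha} = n / \sum_i \log(x_i/\hat{\sigma})$. With $\hat{\sigma} = 1$ (the minimum in Table 1) this gives $\hat{\alpha} = 1/\hat{\mu}$, where $\hat{\mu}$ is the Lognormal mean of $\log x$, which matches Tables 3 and 4 (for ENS, $1/3.2351 \approx 0.3091$). The ECDF uses the plotting position $F_n(x_{(i)}) = i/(n+1)$, which is what the relation $\mathrm{rank}_i = (n+1)(1 - F_n(x_{(i)}))$ of the rank-size construction implies; note that this differs from the textbook KS statistic, which compares $F$ with both $i/n$ and $(i-1)/n$. Function names are hypothetical, and the other five models would plug in their own fit, cdf and quantile functions.

```python
import math
import random


def pareto_fit(x):
    """Closed-form ML fit of the two-parameter Pareto:
    sigma = min(x), alpha = n / sum(log(x_i / sigma))."""
    sigma = min(x)
    return len(x) / sum(math.log(v / sigma) for v in x), sigma


def pareto_cdf(x, alpha, sigma):
    return 0.0 if x < sigma else 1.0 - (sigma / x) ** alpha


def pareto_quantile(p, alpha, sigma):
    """Q(p) with F(Q(p)) = p, used for inverse-transform sampling in step (2)."""
    return sigma * (1.0 - p) ** (-1.0 / alpha)


def ks_statistic(x, cdf):
    """Eq. (4) with the plotting-position ECDF F_n(x_(i)) = i / (n + 1)."""
    xs = sorted(x)
    n = len(xs)
    return max(abs((i + 1) / (n + 1) - cdf(v)) for i, v in enumerate(xs))


def bootstrap_ks_pvalue(x, n_boot=10000, seed=0):
    """Steps (1)-(5) of the procedure, for the Pareto model only."""
    rng = random.Random(seed)
    alpha, sigma = pareto_fit(x)
    d_obs = ks_statistic(x, lambda v: pareto_cdf(v, alpha, sigma))    # step (1)
    exceed = 0
    for _ in range(n_boot):
        sim = [pareto_quantile(rng.random(), alpha, sigma) for _ in x]  # step (2)
        a_s, s_s = pareto_fit(sim)                                      # step (3)
        d_sim = ks_statistic(sim, lambda v: pareto_cdf(v, a_s, s_s))    # step (4)
        exceed += d_sim > d_obs
    return exceed / n_boot                                              # step (5)


# Step (6) on clearly non-Pareto toy data (uniform on [1, 2]), with fewer
# bootstrap sets than the 10000 used in the study to keep the run short.
rng = random.Random(42)
toy = [1.0 + rng.random() for _ in range(200)]
p = bootstrap_ks_pvalue(toy, n_boot=200, seed=1)
print(p, "reject H0" if p < 0.05 else "cannot reject H0")
```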

Finally, as a graphical model validation, we used a rank-size plot (on a log-log scale). Let $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$ be the ordered sample of $X$. We considered the scatter plot of the points (observed data)

$$\log[\mathrm{rank}_i] \ \text{versus} \ \log[x_{(i)}], \quad i = 1, 2, \ldots, n, \tag{5}$$

where $\mathrm{rank}_i = n + 1 - i = (n+1)\left(1 - F_n(x_{(i)})\right)$, plotted it together with the complement of the theoretical cdf of the model multiplied by $(n+1)$,

$$\log\left[(n+1)\left(1 - F(x_{(i)}; \hat{\theta})\right)\right] \ \text{versus} \ \log[x_{(i)}], \quad i = 1, 2, \ldots, n, \tag{6}$$

and evaluated graphically how well the model fitted the observed data.
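The coordinates of Eqs. (5) and (6) can be computed as below; since the logarithms are already taken, any plotting library can draw them on linear axes (natural logs are used here; the base only rescales both axes). The helper name and the toy values are hypothetical.

```python
import math


def rank_size_points(x, cdf):
    """Points of Eq. (5) (observed) and Eq. (6) (model) as (log x, log y) pairs.

    Assumes cdf(v) < 1 for every observation, so that the model curve is
    defined at the largest value.
    """
    xs = sorted(x)
    n = len(xs)
    observed = [(math.log(v), math.log(n + 1 - i)) for i, v in enumerate(xs, start=1)]
    model = [(math.log(v), math.log((n + 1) * (1.0 - cdf(v)))) for v in xs]
    return observed, model


# Hypothetical toy values (not UCTE data) against the Table 3 ENS Pareto II fit.
toy = [1.0, 3.0, 7.0, 20.0, 150.0]
obs, fit = rank_size_points(toy, lambda v: 1.0 - (1.0 + v / 10.578) ** -0.6445)
for (lx, ly), (_, my) in zip(obs, fit):
    print(f"log x = {lx:6.3f}   observed log rank = {ly:6.3f}   model = {my:6.3f}")
```

If the model cdf equals the plotting-position ECDF $i/(n+1)$ at every observation, the two point sets coincide exactly, which is why closeness between the scatter and the curve is read as a good fit.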

3 Results and Discussion 

Tables 3 and 4 show the parameter estimates and their standard errors for the six alternative models considered: the Pareto II distribution ($\alpha$ and $\sigma$ parameters), the Fisk distribution ($\beta$ and $\sigma$ parameters), the Lognormal distribution ($\mu$ and $\sigma$ parameters), the Pareto (power law) distribution ($\alpha$ and $\sigma$ parameters), the Weibull distribution ($\beta$ and $\lambda$ parameters) and the Gamma distribution ($\alpha$ and $\sigma$ parameters), fitted by maximum likelihood to the Energy not Supplied (ENS), Total Loss of Power (TLP) and Restoration Time (RT) datasets over the whole range.
Table 5 shows the values of the BIC statistic (Eq. (3)) obtained from the six candidate models for the ENS, TLP and RT datasets over the whole range. The Pareto II model has the largest BIC value for the ENS and RT datasets, followed by the Fisk and Lognormal distributions. For the TLP dataset, the Fisk, Pareto II and Lognormal models have the largest BIC values; these three values are very similar, and slightly better for the Fisk model. Therefore, according to the Bayesian information criterion, Pareto II is the preferable model for the ENS and RT datasets, and Pareto II, Fisk and Lognormal are the preferable models for the TLP dataset. Note that, in this case, the AIC statistics (Eq. (2)) provide results equivalent to those of the BIC statistics.
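The ranking in Table 5, and the gaps behind the phrase "very similar", can be checked with a few lines of Python; the values are copied from Table 5 and `rank_by_bic` is a hypothetical helper. For TLP the three leading models lie within 0.7 BIC units of each other, whereas for ENS and RT the runner-up (Fisk) trails Pareto II by 12.2 and 7.4 units, respectively.

```python
# BIC values copied from Table 5 (larger is better).
TABLE5_BIC = {
    "ENS": {"Pareto II": -3125.2, "Fisk": -3137.4, "Lognormal": -3139.5,
            "Pareto": -3159.9, "Weibull": -3254.3, "Gamma": -3456.6},
    "TLP": {"Pareto II": -3389.7, "Fisk": -3389.4, "Lognormal": -3390.1,
            "Pareto": -3697.6, "Weibull": -3438.7, "Gamma": -3502.7},
    "RT": {"Pareto II": -3744.0, "Fisk": -3751.4, "Lognormal": -3763.3,
           "Pareto": -3896.7, "Weibull": -3918.7, "Gamma": -4138.9},
}


def rank_by_bic(scores):
    """Return (model, gap to the best BIC) pairs, best model first."""
    best = max(scores.values())
    ordered = sorted(scores, key=scores.get, reverse=True)
    return [(m, round(best - scores[m], 1)) for m in ordered]


for dataset, scores in TABLE5_BIC.items():
    print(dataset, rank_by_bic(scores)[:3])
```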

Tables 6 and 7 show, respectively, the values of the Kolmogorov-Smirnov (KS) statistic (Eq. (4)) and the p-values obtained by bootstrap resampling for the six alternative models, corresponding to the ENS, TLP and RT datasets in the entire domain. For the ENS dataset, the null hypothesis $H_0$ cannot be rejected for the Pareto II model (p-value = 0.0727 > 0.05), whereas it can be rejected (p-value < 0.05) for the remaining five models (Fisk, Lognormal, Pareto, Weibull and Gamma) at the 0.05 level of significance. For the TLP dataset, $H_0$ cannot be rejected for the Pareto II, Fisk and Lognormal models, and can be rejected for the Pareto, Weibull and Gamma models at the 0.05 level of significance. Finally, for the RT dataset, $H_0$ can be rejected for all six models at the 0.05 level of significance.
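Step (6) of the procedure amounts to a simple filter over Table 7, and the sketch below reproduces the conclusions above from the values copied from that table (`not_rejected` is a hypothetical helper). With 10000 synthetic data sets the p-values have a resolution of 0.0001, so an entry of 0.0000 means that no synthetic KS statistic exceeded the observed one.

```python
LEVEL = 0.05  # significance level used in the paper

# Bootstrap p-values copied from Table 7.
TABLE7_P = {
    "ENS": {"Pareto II": 0.0727, "Fisk": 0.0004, "Lognormal": 0.0000,
            "Pareto": 0.0000, "Weibull": 0.0000, "Gamma": 0.0000},
    "TLP": {"Pareto II": 0.3640, "Fisk": 0.4522, "Lognormal": 0.2720,
            "Pareto": 0.0000, "Weibull": 0.0000, "Gamma": 0.0000},
    "RT": {"Pareto II": 0.0013, "Fisk": 0.0000, "Lognormal": 0.0000,
           "Pareto": 0.0000, "Weibull": 0.0000, "Gamma": 0.0000},
}


def not_rejected(pvalues, level=LEVEL):
    """Models whose null hypothesis H0 cannot be rejected (p >= level)."""
    return [m for m, p in pvalues.items() if p >= level]


for dataset, pvalues in TABLE7_P.items():
    print(dataset, not_rejected(pvalues))
```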

Rank-size plots (Eqs. (5) and (6)) of the major events between 2002 and 2009, over the whole range of the Energy not Supplied (ENS, in MWh), Total Loss of Power (TLP, in MW) and Restoration Time (RT, in minutes) datasets, show graphically (see Figure 1) the adequacy of the Pareto II model for the ENS dataset, in contrast to the Fisk and Lognormal distributions; the adequacy of the Pareto II, Fisk and Lognormal models for the TLP dataset; and the better description of the RT dataset given by the Pareto II model compared with the Fisk and Lognormal models.

In summary, according to the results obtained, the Pareto II distribution may serve as an adequate model for Energy Not Supplied and Total Loss of Power data from major failures in electricity transmission networks over the entire domain. Additionally, the Pareto II distribution fits the Restoration Time data reasonably well, albeit with some deviation, and improves on the other alternative models (Fisk, Lognormal, Pareto, Weibull and Gamma distributions); unfortunately, this deviation is statistically significant. Note that the Pareto II distribution is a shifted power law distribution, which turns into a Pareto distribution for
Table 3: Parameter estimates of the Pareto II, Fisk and Lognormal models fitted to the ENS,
TLP and RT datasets by maximum likelihood (standard errors in parentheses).

                 Pareto II               Fisk                    Lognormal
Data Set     α          σ           β          σ           μ          σ
ENS          0.6445     10.578      0.8678     21.787      3.2351     2.0546
             (0.0428)   (1.4033)    (0.0299)   (1.8136)    (0.0851)   (0.0602)
TLP          1.1953     115.56      1.0787     89.034      4.4894     1.6495
             (0.1146)   (18.127)    (0.0390)   (6.2531)    (0.0718)   (0.0508)
RT           0.7768     17.896      0.9819     26.210      3.4172     1.8521
             (0.0499)   (2.1207)    (0.0312)   (1.7666)    (0.0706)   (0.0499)



Table 4: Parameter estimates of the Pareto, Weibull and Gamma models fitted to the ENS,
TLP and RT datasets by maximum likelihood (standard errors in parentheses).

                 Pareto                  Weibull                 Gamma
Data Set     α          σ           β          λ           α          σ
ENS          0.3091     1.0000      0.4128     75.802      0.2249     2806.5
             (0.0154)   (0.0896)    (0.0113)   (8.0903)    (0.0102)   (276.05)
TLP          0.2227     1.0000      0.5930     203.14      0.4497     832.47
             (0.0110)   (0.1046)    (0.0179)   (15.815)    (0.0226)   (68.357)
RT           0.2926     1.0000      0.4398     82.772      0.2544     1939.0
             (0.0133)   (0.0837)    (0.0109)   (7.6332)    (0.0107)   (167.55)



Table 5: BIC statistics for the six candidate models, fitted to the ENS, TLP and RT datasets in
the entire domain. Larger values indicate better fitted models.

Data Set   Pareto II   Fisk      Lognormal   Pareto    Weibull   Gamma
ENS        -3125.2     -3137.4   -3139.5     -3159.9   -3254.3   -3456.6
TLP        -3389.7     -3389.4   -3390.1     -3697.6   -3438.7   -3502.7
RT         -3744.0     -3751.4   -3763.3     -3896.7   -3918.7   -4138.9
Table 6: Empirical KS statistics for the six candidate models in the entire domain of the
ENS, TLP and RT datasets.

Data Set   Pareto II   Fisk     Lognormal   Pareto   Weibull   Gamma
ENS        0.0323      0.0447   0.0695      0.1642   0.1150    0.2588
TLP        0.0266      0.0240   0.0213      0.3348   0.0755    0.1481
RT         0.0402      0.0522   0.0664      0.2131   0.1335    0.2692



Table 7: Bootstrap p-values for the six candidate models in the entire domain of the ENS,
TLP and RT datasets. Values of p < 0.05 indicate that the models can be ruled out at
the 0.05 level of significance.

Data Set   Pareto II   Fisk     Lognormal   Pareto   Weibull   Gamma
ENS        0.0727      0.0004   0.0000      0.0000   0.0000    0.0000
TLP        0.3640      0.4522   0.2720      0.0000   0.0000    0.0000
RT         0.0013      0.0000   0.0000      0.0000   0.0000    0.0000



large values of the variable [22], thus following the known power law behaviour in the upper tail, while having only two parameters, which keeps the model simple. For all of these reasons, we think that the Pareto II (Lomax) distribution is a good alternative for modelling power grid reliability data over the entire domain of the major events.
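Assuming the standard Lomax form of the Pareto II cdf, $F(x) = 1 - (1 + x/\sigma)^{-\alpha}$ (our reading of Table 2), the shifted power law property follows in two lines: the survival function is the Pareto survival function $(\sigma/y)^{\alpha}$ evaluated at the shifted variable $y = x + \sigma$, and for $x$ much larger than $\sigma$ it decays as a power law with the same exponent $\alpha$.

$$
\begin{aligned}
1 - F(x) &= \left(1 + \frac{x}{\sigma}\right)^{-\alpha} = \left(\frac{\sigma}{x + \sigma}\right)^{\alpha}, \qquad x \ge 0,\\
1 - F(x) &\approx \left(\frac{\sigma}{x}\right)^{\alpha} = \sigma^{\alpha}\, x^{-\alpha} \qquad \text{for } x \gg \sigma .
\end{aligned}
$$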

4 Conclusions 

We found a two-parameter probability distribution that can be used to model major events reliability data of electricity transmission networks over the entire domain: the Pareto II distribution, also known as the Lomax distribution.

The Pareto II model fits the pattern of the Energy not Supplied (ENS) and Total Loss of Power (TLP) data very well and is the best of the six models considered for the Restoration Time (RT) data. Additionally, we found two other two-parameter models, the Fisk (also known as Log-logistic) and the Lognormal distributions, that are adequate specifically for the Total Loss of Power data.

We considered the major failures produced between 2002 and 2009 in the European power grid operated by UCTE; analyzed three reliability indicators: ENS, TLP and RT; fitted six alternative models (Pareto II, Fisk, Lognormal, Pareto (power law), Weibull and Gamma distributions) to the data by maximum likelihood; compared these models by the Bayesian information criterion; tested their goodness-of-fit by a Kolmogorov-Smirnov test method based on bootstrap resampling; and validated them graphically by rank-size plots.

Future work is needed to find a better model for Restoration Time data from major failures in power grids over the entire domain, with two parameters, or three if necessary.

Previous empirical research has shown that the Pareto (power law) distribution is an adequate model to describe major events reliability data of electricity transmission networks in the upper tail. In this study we found that the Pareto II distribution, a shifted power law distribution, is a better choice to describe such data over the entire domain.
Figure 1: Rank-size plots of the complement of the cdf multiplied by n + 1 (solid lines) for the Pareto II (Pa II), Fisk (Fk) and Lognormal (Ln) distributions, together with the observed data, on a log-log scale. Left: Pareto II model. Right: Fisk and Lognormal models. Data: energy not supplied (ENS), total loss of power (TLP) and restoration time (RT), from European power grid major events in the entire domain, in the period 2002-2009.